A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
نویسنده
چکیده
Subspace clustering is an emerging task that aims at detecting clusters in entrenched in subspaces. Recent approaches fail to reduce results to relevant subspace clusters. Their results are typically highly redundant and lack the fact of considering the critical problem, “the density divergence problem,” in discovering the clusters, where they utilize an absolute density value as the density threshold to identify the dense regions in all subspaces. Considering the varying region densities in different subspace cardinalities, we note that a more appropriate way to determine whether a region in a subspace should be identified as dense is by comparing its density with the region densities in that subspace. Based on this idea and due to the infeasibility of applying previous techniques in this novel clustering model, we devise an innovative algorithm, referred to as DENCOS(DENsity Conscious Subspace clustering), to adopt a divide-and-conquer scheme to efficiently discover clusters satisfying different density thresholds in different subspace cardinalities. DENCOS can discover the clusters in all subspaces with high quality, and the efficiency significantly outperforms previous works, thus demonstrating its practicability for subspace clustering. As validated by our extensive experiments on retail dataset, it outperforms previous works. We extend our work with a clustering technique based on genetic algorithms which is capable of optimizing the number of clusters for tasks with well formed and separated clusters.
منابع مشابه
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
متن کاملAn Efficient and Fast Density Conscious Subspace Clustering using Affinity Propagation
Subspace clustering is an eminent task to detect the clusters in subspaces. Density-based approaches assume the high-density region in the subspace as a cluster, but it creates density divergence problem. The proposed work improves the performance of Density Conscious subspace clustering (DENCOS) by utilizing the Affinity Propagation (AP) algorithm to detect the local densities for a dataset. I...
متن کاملComparison of Genetic and Hill Climbing Algorithms to Improve an Artificial Neural Networks Model for Water Consumption Prediction
No unique method has been so far specified for determining the number of neurons in hidden layers of Multi-Layer Perceptron (MLP) neural networks used for prediction. The present research is intended to optimize the number of neurons using two meta-heuristic procedures namely genetic and hill climbing algorithms. The data used in the present research for prediction are consumption data of water...
متن کاملSolving a Stochastic Cellular Manufacturing Model by Using Genetic Algorithms
This paper presents a mathematical model for designing cellular manufacturing systems (CMSs) solved by genetic algorithms. This model assumes a dynamic production, a stochastic demand, routing flexibility, and machine flexibility. CMS is an application of group technology (GT) for clustering parts and machines by means of their operational and / or apparent form similarity in different aspects ...
متن کاملDensity Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms
Clustering has been recognized as an important and valuable capability in the data mining field. Instead of finding clusters in the full feature space, subspace clustering is an emergent task which aims at detecting clusters embedded in subspaces. Most of previous works in the literature are density-based approaches, where a cluster is regarded as a high-density region in a subspace. However, t...
متن کامل